ExpHunterSuite: Differential Expression Report
Data quality control (QC)
Correlation between samples
Here we show scatterplots comparing expression levels for all genes between the different samples, for i) all controls, ii) all treatment samples and iii) for all samples together.
These plots will only be produced when the total number of samples to compare within a group is less than or equal to 10.
Correlation between control samples
Replicates within the same group tend to have Pearson correlation coefficients >= 0.96. Lower values may indicate problems with the samples.
Correlation between treatment samples
Replicates within the same group tend to have Pearson correlation coefficients >= 0.96. Lower values may indicate problems with the samples.
Correlation between samples: All vs all replicates
Correlation coefficients tend to be slightly higher between replicates from the same group than between
replicates from different groups. If this is not the case, it may indicate mislabelling or other
potential issues.
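As a rough illustration of the check described above, the following Python sketch computes pairwise Pearson correlation coefficients between samples and lists the pairs that fall below 0.96. The sample names, expression values and the helper `low_correlation_pairs` are hypothetical and are not part of the report's own code.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def low_correlation_pairs(expression, threshold=0.96):
    """Return (sample_a, sample_b, r) for every pair with r below the threshold."""
    flagged = []
    for a, b in combinations(sorted(expression), 2):
        r = pearson(expression[a], expression[b])
        if r < threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged


# Hypothetical expression values for three replicates of one group.
controls = {
    "ctrl_1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "ctrl_2": [1.1, 2.0, 2.9, 4.2, 5.1],
    "ctrl_3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
flagged = low_correlation_pairs(controls)
```

In this made-up example only the pairs involving `ctrl_3` are flagged, which is the kind of pattern that would suggest a problem with a single sample.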
Heatmap and clustering showing correlation between replicates
BROWN: higher correlation; YELLOW: lower correlation
Overview
This section gives a general overview of the dimensionality reduction analysis applied.
Inertia
Graphical representation of the Principal Components (PCs). The bars represent the percentage of total variance summarized by each PC.
The line shows the cumulative percentage of total variance across PCs. The color distinguishes significant from non-significant PCs.
Only significant PCs will be considered in the following plots.
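The bars and the line in the inertia plot can be related with a small calculation. The sketch below assumes that the variance captured by each PC (its eigenvalue) is already available; the eigenvalues used here are hypothetical, and the function name `explained_variance` is illustrative only.

```python
def explained_variance(eigenvalues):
    """Return per-PC and cumulative percentages of total variance."""
    total = sum(eigenvalues)
    percent = [100.0 * ev / total for ev in eigenvalues]
    cumulative = []
    running = 0.0
    for p in percent:
        running += p
        cumulative.append(running)
    return percent, cumulative


# Hypothetical eigenvalues for four PCs, sorted from largest to smallest.
eigenvalues = [5.0, 3.0, 1.5, 0.5]
percent, cumulative = explained_variance(eigenvalues)
# percent corresponds to the bars, cumulative to the line.
```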
Sample coordinates
This plot represents the coordinates of the samples on the first two Principal Components (Dim1 and Dim2).
The percentage of explained variance is given in brackets.
This is a simplified plot of the samples displayed on the two main PCs. The color of each sample indicates its experimental condition.
Categorical variables
This section explores the relationship between supplementary categorical variables (-S option), samples and PCs.
Coordinates of categories
This plot represents the coordinates of the samples and supplementary categories on the first two Principal Components (Dim1 and Dim2).
The samples and categories are represented in black and purple, respectively.
The percentage of explained variance is given in brackets.
Comparison of significant dimensions
This plot compares the position of the samples and their distribution in the significant PCs.
The color differentiates between the control (red) and treatment (blue) samples.
Association between qualitative variables and PCs
This plot represents the association between the qualitative variables and the PCs. The association value is the R2 value.
The flags represent the significance measured with an analysis of variance, where *: 0.01 < P < 0.05; **: 0.001 < P < 0.01; ***: P < 0.001.
None of the factors was significantly associated with any PC
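For reference, one common way to obtain an R2 value between a qualitative variable and a PC is the ratio of between-group to total sum of squares from a one-way analysis of variance. The sketch below assumes that definition; the report does not state the exact formula, and the coordinates and group labels are hypothetical.

```python
def anova_r2(coordinates, groups):
    """R2 of a one-way ANOVA: between-group sum of squares over total sum of squares."""
    n = len(coordinates)
    grand_mean = sum(coordinates) / n
    ss_total = sum((x - grand_mean) ** 2 for x in coordinates)
    by_group = {}
    for x, g in zip(coordinates, groups):
        by_group.setdefault(g, []).append(x)
    ss_between = sum(
        len(values) * (sum(values) / len(values) - grand_mean) ** 2
        for values in by_group.values()
    )
    return ss_between / ss_total


# Hypothetical sample coordinates on one PC and their group labels.
pc_coordinates = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
labels = ["control", "control", "control", "treatment", "treatment", "treatment"]
r2 = anova_r2(pc_coordinates, labels)
```

A value close to 1 means the PC separates the categories almost completely; a value close to 0 means the categories overlap on that PC.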
Association between categories and PCs
This plot represents the association between the categories of the qualitative variables and the PCs. The association value is the mean coordinate of the samples within each category on each PC.
The flags represent the significance measured with a Student's t test, where *: 0.01 < P < 0.05; **: 0.001 < P < 0.01; ***: P < 0.001.
None of the categories was significantly associated with any PC
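The significance flags can be read as a simple mapping from P value to symbol. The sketch below assumes the bands are contiguous, so that ** covers P values from 0.001 up to (but not including) 0.01; how exact boundary values are handled is also an assumption, and `significance_flag` is a hypothetical helper.

```python
def significance_flag(p_value):
    """Map a P value to the star notation used for the association plots."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""  # not significant, no flag shown


flags = [significance_flag(p) for p in (0.0005, 0.005, 0.03, 0.2)]
```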
No supplementary quantitative variables were included in this analysis.
TOP active quantitative variables
This table summarizes the top 10 active quantitative variables associated with PCs.
None of the factors was significantly associated with any PC
HCPC
This section explores the groups of samples based on Hierarchical Clustering on Principal Components (HCPC) and the relationship of the clusters with supplementary variables.
For the HCPC, only the first 2 relevant PCs were used.
Hierarchical clustering of individuals (2 significant PCs)
This plot represents the dendrogram of the HCPC of the individuals. The groups of individuals have different colors and the inertia plot is shown at the top right.
HCPC coordinates
This plot represents the coordinates of the samples on the two main Principal Components.
The percentage of summarized variance is shown in brackets.
The samples are colored by their HCPC cluster.
Relationship between HCPC clusters and supplementary qualitative variables
Fisher's exact test is computed between clusters and experimental treatments. The P values and FDR from Fisher's exact test are shown.
None of the clusters was significantly associated with any experimental group
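To make the test above concrete, the following sketch computes a two-sided Fisher's exact test for a 2x2 table of cluster membership against experimental group. The table layout, the counts and the function `fisher_exact_2x2` are hypothetical; the report may build its contingency tables differently.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test P value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= observed * (1 + 1e-9)
    )


# Hypothetical table: rows are "in cluster" / "not in cluster",
# columns are control / treatment sample counts.
p_separated = fisher_exact_2x2(4, 0, 0, 3)
p_mixed = fisher_exact_2x2(2, 2, 2, 1)
```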
Visualizing normalization results
These boxplots show the distributions of count data before and after normalization (shown for
normalization method: default).
Representation of cpm unfiltered data
Before normalization
After normalization
Count metrics by sample ranks
Sample rank versus total counts
Sample rank: the position a sample holds after sorting by total counts.
Statistics of expressed genes
Samples are ranked by total expressed genes. Union of expressed genes represents the cumulative total expressed genes (sum of
all genes expressed in any sample up to current sample, expected to increase with sample rank). Intersection of expressed genes
represents the cumulative intersection of expressed genes (sum of genes expressed in every sample up to current sample, expected
to decrease with sample rank).
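The cumulative union and intersection described above can be sketched with plain sets. This example assumes samples are ranked in ascending order of expressed gene count, which the report does not specify; the sample and gene names are hypothetical.

```python
def cumulative_gene_stats(expressed):
    """Return (sample, union size, intersection size) in rank order.

    `expressed` maps each sample name to the set of genes expressed in it.
    """
    ranked = sorted(expressed, key=lambda sample: len(expressed[sample]))
    union = set()
    intersection = None
    rows = []
    for sample in ranked:
        genes = expressed[sample]
        union |= genes
        if intersection is None:
            intersection = set(genes)
        else:
            intersection &= genes
        rows.append((sample, len(union), len(intersection)))
    return rows


# Hypothetical expressed gene sets for three samples.
expressed = {
    "s1": {"g1", "g2"},
    "s2": {"g1", "g2", "g3"},
    "s3": {"g2", "g3", "g4", "g5"},
}
rows = cumulative_gene_stats(expressed)
```

The union size can only stay equal or grow along the ranking, while the intersection size can only stay equal or shrink.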
Mean count distribution by filter
This plot represents the distribution of mean counts per gene, classified by filter.
Gene counts variance distribution
The variance of gene counts across samples is represented. Genes with a variance lower than the selected threshold (dashed grey line) were filtered out.
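A minimal sketch of a variance filter of this kind is shown below. It assumes the sample variance is used and that the threshold is supplied directly; how the report derives the threshold (for example from the `count_var_quantile` option) is not shown here, and the gene names and counts are hypothetical.

```python
from statistics import variance


def filter_by_variance(counts, threshold):
    """Keep genes whose count variance across samples is at least the threshold."""
    kept = {}
    for gene, values in counts.items():
        if variance(values) >= threshold:
            kept[gene] = values
    return kept


# Hypothetical counts for three genes across four samples.
counts = {
    "geneA": [10, 10, 10, 10],  # no variation, removed
    "geneB": [5, 15, 25, 35],
    "geneC": [8, 9, 10, 11],
}
kept = filter_by_variance(counts, threshold=1.0)
```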
Samples differences by all counts normalized
All counts were normalized by the default algorithm (see options below). These counts were then scaled by
log10 and plotted in a heatmap.
Sample differences by total normalized counts
Percentages of reads per sample mapping to the most highly expressed genes
| Sample | ENSG00000276168 | ENSG00000275395 | ENSG00000198804 | ENSG00000198886 | ENSG00000198786 |
| opg_1 | 8.057 | 1.651 | 2.119 | 1.029 | 0.996 |
| opg_2 | 7.138 | 1.435 | 3.124 | 1.511 | 1.455 |
| opg_3 | 5.581 | 4.402 | 1.66 | 0.802 | 0.725 |
| opg_4 | 5.772 | 3.786 | 1.674 | 0.84 | 0.766 |
| gtdup_1 | 7.964 | 3.052 | 2.442 | 1.125 | 0.983 |
| gtdup_2 | 9.057 | 4.253 | 3.974 | 2.171 | 1.855 |
| gtdup_3 | 2.298 | 4.13 | 1.764 | 0.78 | 0.709 |
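A table of this shape could be derived from a count matrix roughly as follows. The sketch assumes that the most highly expressed genes are chosen by their total counts over all samples and that percentages are taken relative to each sample's total counts; both points are assumptions, and the data are hypothetical.

```python
def percent_reads_in_top_genes(counts, n_top=5):
    """Return {sample: {gene: percentage of that sample's reads}} for the top genes.

    `counts` maps each sample name to a {gene: read count} dictionary.
    """
    totals = {}
    for gene_counts in counts.values():
        for gene, count in gene_counts.items():
            totals[gene] = totals.get(gene, 0) + count
    top_genes = sorted(totals, key=totals.get, reverse=True)[:n_top]

    table = {}
    for sample, gene_counts in counts.items():
        library_size = sum(gene_counts.values())
        table[sample] = {
            gene: round(100.0 * gene_counts.get(gene, 0) / library_size, 3)
            for gene in top_genes
        }
    return table


# Hypothetical counts for two samples and three genes.
counts = {
    "s1": {"gA": 80, "gB": 15, "gC": 5},
    "s2": {"gA": 50, "gB": 30, "gC": 20},
}
table = percent_reads_in_top_genes(counts, n_top=2)
```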
Details of input data
First group of samples (to be referred to as control in the rest of the report)
| Sample Names: |
| opg_1 |
| opg_2 |
| opg_3 |
| opg_4 |
Second group of samples (to be referred to as treatment in the rest of the report)
| Sample Names: |
| gtdup_1 |
| gtdup_2 |
| gtdup_3 |
DEgenes Hunter results
Gene classification by DEgenes Hunter
DEgenes Hunter uses multiple DE detection packages to analyse all genes in the input count table and labels them accordingly.
Note: A positive log fold change shows higher expression in the treatment group; a negative log fold change represents higher expression in the control group.
- Filtered out: Genes discarded during the filtering process as showing no or very low expression.
- Prevalent DEG: Genes considered as differentially expressed (DE) by at least 2 packages, as specified by the `minpack_common` argument.
- Possible DEG: Genes considered DE by at least one of the DE detection packages.
- Not DEG: Genes not considered DE in any package.
This barplot shows the total number of genes passing each stage of analysis - from the total number of genes in the input table of counts, to the genes surviving the expression filter, to the genes detected as DE by one package, to the genes detected by at least 2 packages.
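The labelling rules listed above can be summarized in a few lines of Python. This is only a sketch of the logic as described in this report, not the package's implementation; `classify_gene` and its arguments are hypothetical names.

```python
def classify_gene(passed_filter, de_calls, minpack_common=2):
    """Label a gene from per-package DE calls.

    `de_calls` maps a package name to True if that package called the gene DE.
    """
    if not passed_filter:
        return "Filtered out"
    n_packages = sum(1 for called in de_calls.values() if called)
    if n_packages >= minpack_common:
        return "Prevalent DEG"
    if n_packages >= 1:
        return "Possible DEG"
    return "Not DEG"


labels = [
    classify_gene(False, {}),
    classify_gene(True, {"DESeq2": True, "edgeR": True}),
    classify_gene(True, {"DESeq2": True, "edgeR": False}),
    classify_gene(True, {"DESeq2": False, "edgeR": False}),
]
```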
Package DEG detection stats
This is the Venn diagram of all possible DE genes (DEGs) according to at least one of the
selected DE detection packages.
Plot showing variability between different DEG detection methods in terms of logFC calculation
This graph shows the logFC calculated (y-axis) by each package (points) for each gene (x-axis). Only genes with variability over 0.01 will be plotted. This representation allows the user to observe the behaviour of each DE package and see whether one of them gives atypical results.
If there are no genes showing sufficient variance in estimated logFC across methods, no plot will be produced and a warning message will be given.
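The selection of genes for this plot can be sketched as follows. The report does not say how variability is measured, so the example assumes the sample variance of the logFC values across packages; the gene names and logFC values are hypothetical.

```python
from statistics import variance


def variable_logfc_genes(logfc_by_gene, min_variability=0.01):
    """Return genes whose logFC variance across packages exceeds the cutoff.

    `logfc_by_gene` maps a gene to a {package: logFC} dictionary.
    """
    selected = []
    for gene, by_package in logfc_by_gene.items():
        values = list(by_package.values())
        if len(values) > 1 and variance(values) > min_variability:
            selected.append(gene)
    return selected


# Hypothetical logFC estimates from two packages.
logfc_by_gene = {
    "gene1": {"DESeq2": 1.50, "edgeR": 1.52},  # packages agree closely
    "gene2": {"DESeq2": 2.00, "edgeR": 2.40},  # packages disagree
}
selected = variable_logfc_genes(logfc_by_gene)
```

If `selected` is empty, there is nothing to plot, which corresponds to the warning case mentioned above.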
FDR gene-wise benchmarking
Benchmark of false positive calling:
Boxplot of FDR values among all genes with an FDR <= 0.05 in at least one DE detection package
FDR Volcano Plot showing log 2 fold change vs. FDR
The red horizontal line represents the FDR threshold, which has been set to 0.05.
The black lines represent other values.
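To illustrate how a gene is placed on a volcano plot, the sketch below converts an FDR value to the usual -log10 scale and labels the point. It assumes FDR values greater than zero and that the `lfc` option (1.5 in this run) is a cutoff on the absolute log2 fold change; the function `volcano_point` and the input values are hypothetical.

```python
from math import log10


def volcano_point(log2_fc, fdr, fdr_cutoff=0.05, lfc_cutoff=1.5):
    """Return (x, y, label) for one gene on a volcano plot."""
    y = -log10(fdr)  # higher values mean stronger evidence
    significant = fdr <= fdr_cutoff and abs(log2_fc) >= lfc_cutoff
    if not significant:
        label = "not significant"
    elif log2_fc > 0:
        label = "higher in treatment"
    else:
        label = "higher in control"
    return log2_fc, y, label


# Hypothetical genes.
up = volcano_point(2.0, 0.01)
down = volcano_point(-2.0, 0.001)
weak = volcano_point(-2.0, 0.2)
```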
Overview
This section gives a general overview of the dimensionality reduction analysis applied.
Inertia
Graphical representation of the Principal Components (PCs). The bars represent the percentage of total variance summarized by each PC.
The line shows the cumulative percentage of total variance across PCs. The color distinguishes significant from non-significant PCs.
Only significant PCs will be considered in the following plots.
Sample coordinates
This plot represents the coordinates of the samples on the first two Principal Components (Dim1 and Dim2).
The percentage of explained variance is given in brackets.
This is a simplified plot of the samples displayed on the two main PCs. The color of each sample indicates its experimental condition.
Categorical variables
This section explores the relationship between supplementary categorical variables (-S option), samples and PCs.
Coordinates of categories
This plot represents the coordinates of the samples and supplementary categories on the first two Principal Components (Dim1 and Dim2).
The samples and categories are represented in black and purple, respectively.
The percentage of explained variance is given in brackets.
Comparison of significant dimensions
This plot compares the position of the samples and their distribution in the significant PCs.
The color differentiates between the control (red) and treatment (blue) samples.
Association between qualitative variables and PCs
This plot represents the association between the qualitative variables and the PCs. The association value is the R2 value.
The flags represent the significance measured with an analysis of variance, where *: 0.01 < P < 0.05; **: 0.001 < P < 0.01; ***: P < 0.001.
None of the factors was significantly associated with any PC
Association between categories and PCs
This plot represents the association between the categories of the qualitative variables and the PCs. The association value is the mean coordinate of the samples within each category on each PC.
The flags represent the significance measured with a Student's t test, where *: 0.01 < P < 0.05; **: 0.001 < P < 0.01; ***: P < 0.001.
None of the categories was significantly associated with any PC
No supplementary quantitative variables were included in this analysis.
TOP active quantitative variables
This table summarizes the top 10 active quantitative variables associated with PCs.
None of the factors was significantly associated with any PC
HCPC
This section explores the groups of samples based on Hierarchical Clustering on Principal Components (HCPC) and the relationship of the clusters with supplementary variables.
For the HCPC, only the first 2 relevant PCs were used.
Hierarchical clustering of individuals (2 significant PCs)
This plot represents the dendrogram of the HCPC of the individuals. The groups of individuals have different colors and the inertia plot is shown at the top right.
HCPC coordinates
This plot represents the coordinates of the samples on the two main Principal Components.
The percentage of summarized variance is shown in brackets.
The samples are colored by their HCPC cluster.
Relationship between HCPC clusters and supplementary qualitative variables
Fisher's exact test is computed between clusters and experimental treatments. The P values and FDR from Fisher's exact test are shown.
None of the clusters was significantly associated with any experimental group
DEgenes Hunter differential expression analysis results can be found in file Common_results/hunter_results_table.txt
DE detection package-specific results
Various plots specific to each package are shown below:
DESeq2 normalization effects
This plot compares the effective library size with raw library size
The effective library size is the factor used by the DESeq2 normalization algorithm for each sample. The effective library size should depend on the raw library size.
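For readers unfamiliar with how such per-sample factors arise, the sketch below implements the median-of-ratios idea commonly described for DESeq2 size factor estimation: each sample's counts are compared with the per-gene geometric mean across samples, and the median ratio is taken. This is a simplified pure-Python illustration on hypothetical counts, not the DESeq2 code, and the report does not state exactly how its effective library size is derived from these factors.

```python
from math import exp, log
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    `counts` maps each sample to a list of gene counts in the same gene order.
    Genes with a zero count in any sample are skipped.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])

    # Log of the geometric mean of each gene across samples.
    log_geo_means = []
    for i in range(n_genes):
        values = [counts[s][i] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)

    factors = {}
    for s in samples:
        log_ratios = [
            log(counts[s][i]) - m
            for i, m in enumerate(log_geo_means)
            if m is not None
        ]
        factors[s] = exp(median(log_ratios))
    return factors


# Hypothetical counts: s2 was sequenced twice as deeply as s1.
counts = {"s1": [10, 20, 30, 0], "s2": [20, 40, 60, 5]}
factors = size_factors(counts)
```

In this example the factor for `s2` is twice that of `s1`, matching the twofold difference in sequencing depth, which is the dependence on raw library size that the plot is meant to confirm.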
DESeq2 MA plot
This is the MA plot from the DESeq2 package.
In DESeq2, the MA plot (log ratio versus abundance) shows the log2 fold changes attributable to a given variable over the mean of normalized counts. Points are colored red if the adjusted p-value is less than 0.1. Points which fall outside the window are plotted as open triangles pointing either up or down.
A table containing the DESeq2 DEGs is provided in Results_DESeq2/DEgenes_DESEq2.txt
A table containing the DESeq2 normalized counts is provided in Results_DESeq2/Normalized_counts_DESEq2.txt
Differences between samples by PREVALENT DEGs normalized counts
Counts of prevalent DEGs were normalized by the DESeq2 algorithm. These counts were scaled by log10 and plotted in a heatmap.
edgeR MA plot
This is the MA plot from the edgeR package.
Differential gene expression data can be visualized as MA plots (log ratio versus abundance) where each dot represents a gene. The differentially expressed genes are colored red and the non-differentially expressed ones are colored black.
A table containing the edgeR DEGs is provided in Results_edgeR/DEgenes_edgeR.txt
A table containing the edgeR normalized counts is provided in Results_edgeR/Normalized_counts_edgeR.txt
Detailed package results comparison
This is an advanced section for comparing the unadjusted output of the packages used
for DE analysis. The data shown here do not necessarily reflect biological impact.
P-value distributions
Distributions of p-values, unadjusted and adjusted for multiple testing (FDR)
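As an illustration of what adjusting for multiple testing involves, the sketch below applies the Benjamini-Hochberg procedure, a widely used FDR method. That this is the adjustment behind the values plotted here is an assumption, and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical unadjusted p-values for four genes.
p_values = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(p_values)
```

Adjusted values are never smaller than the unadjusted ones, so the adjusted distribution is shifted towards 1.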
FDR Correlations
Values of options used to run DEGenesHunter
First column contains the option names; second column contains the given values for each option in this run.
| Option | Value |
| input_file | /Users/marmtnez/Desktop/Master_Bioinfo/TFM/Files/final_counts.txt |
| pseudocounts | FALSE |
| reads | 2 |
| count_var_quantile | 0 |
| minlibraries | 2 |
| filter_type | separate |
| output_files | /Users/marmtnez/Desktop/Master_Bioinfo/TFM/Results/degenes/GTDUP_vs_OPG_degenes |
| p_val_cutoff | 0.05 |
| lfc | 1.5 |
| modules | DE |
| minpack_common | 2 |
| target_file | /Users/marmtnez/Desktop/Master_Bioinfo/TFM/Files/gtdup_vs_opg_target.txt |
| model_variables | |
| numerics_as_factors | FALSE |
| string_factors | |
| numeric_factors | |
| WGCNA_memory | 5000 |
| WGCNA_norm_method | DESeq2 |
| WGCNA_deepsplit | 2 |
| WGCNA_min_genes_cluster | 20 |
| WGCNA_detectcutHeight | 0.995 |
| WGCNA_mergecutHeight | 0.25 |
| WGCNA_all | FALSE |
| WGCNA_blockwiseNetworkType | signed |
| WGCNA_blockwiseTOMType | signed |
| WGCNA_minCoreKME | 0.7 |
| WGCNA_minKMEtoStay | 0.5 |
| WGCNA_corType | pearson |
| multifactorial | |
| help | FALSE |